Skip to content

#437: canonical RuntimeRetention owner + log-retention alias + CLAUDE.md carveout#480

Closed
loning wants to merge 4 commits into
auto-refact-devfrom
refactor/iter437-issue-437
Closed

#437: canonical RuntimeRetention owner + log-retention alias + CLAUDE.md carveout#480
loning wants to merge 4 commits into
auto-refact-devfrom
refactor/iter437-issue-437

Conversation

@loning
Copy link
Copy Markdown
Contributor

@loning loning commented Jun 3, 2026

摘要

解决设计 issue #437(r4 consensus:structural)。

  • Old:log-retention 被扩义承载 runtime 生成物/pending-events/stale worktree 清理,operational name 与授权边界漂移。
  • New:新增 canonical RuntimeRetention owner;log-retention 降为一版 compatibility alias;清理覆盖 host-opt-in generated files + pending-events same-inode compaction + planner 证明 stale 的 .worktrees remove/prune;CLAUDE.md 加 .refactor-loop 历史生成物与陈旧 worktree 的有界清理机制 #437 narrow local-GC carveout(无 GitHub/git lifecycle authority)。

范围

新 runtime_retention.py + cli/restart + log-retention alias + CLAUDE.md/SKILL.md carveout + 新 test_runtime_retention.py(+224)+ source-regression。consensus 授权 CLAUDE.md 变更。

Closes #437

🤖 Auto-loop / codex-refactor-loop

⟦AI:AUTO-LOOP⟧

loning added 2 commits June 3, 2026 12:12
…覆盖 generated files+pending-events compaction+stale worktree;CLAUDE.md 加 #437 narrow carveout
@loning loning added crnd:lifecycle:managed loop-managed item crnd:phase:reviewing review-gate reviewers in flight crnd:human:auto auto-advancing, no human needed labels Jun 3, 2026
@loning
Copy link
Copy Markdown
Contributor Author

loning commented Jun 3, 2026

质量 review:approve

从可读性和简单性角度看,这个 PR 没有必须阻塞的问题。RuntimeRetention 入口保持线性,实际删除、pending-events compaction、retention plan 读取、worktree git recheck 都拆成小 helper;旧 log-retention 也只是兼容 alias,没有第二套 owner 或重复清理逻辑。

小备注:runtime retention 的摘要格式在 CLI summary 和 restart logging 里有两处相近拼接,但只有两处且不影响理解,未达到需要抽取的阈值。

⟦AI:AUTO-LOOP⟧

@loning
Copy link
Copy Markdown
Contributor Author

loning commented Jun 3, 2026

架构审查结论:reject。

PR 新增的 #437 carveout 在 CLAUDE.md/SKILL/authorization mirror 里要求 RuntimeRetention 只能删除 planner 证明 eligible 的 generated regular files。但实现里 runtime_retention.py 先调用 _delete_generated_files(repo_real, cutoff),该函数只按目录、后缀、regular-file、mtime 删除 .refactor-loop/{logs,prompts,runs} 下的文件,没有读取或校验 runtime-retention plan。对应行为测试 test_runtime_retention.py::test_deletes_only_host_opt_in_generated_regular_files_older_than_24h 也把“无 planner proof 仍删除旧文件”锁成了期望行为。

这让实际 runtime authority 比新写入的 narrow allowlist 更宽,违反 PR 自己新增的 强类型边界,窄扩展点 / #437 local-GC carveout。

需要修正为:generated file deletion 也必须消费 planner proof,或通过同一共识路径收窄 CLAUDE.md/SKILL/authorization 文案,让授权和实现一致。建议同时补一条行为测试:没有 planner eligibility 的旧 generated file 必须保留。

⟦AI:AUTO-LOOP⟧

@loning
Copy link
Copy Markdown
Contributor Author

loning commented Jun 3, 2026

测试视角结论:reject。

RuntimeRetention 的基础行为测试是有的,包括 opt-in/noop、generated file 删除、same-inode pending-events compaction、CLI alias。但当前缺少 destructive stale-worktree removal 的关键负向覆盖。

具体问题:

  • skills/codex-refactor-loop/scripts/codex_refactor_loop/runtime_retention.py:184 新增 missing/non-directory worktree path 拦截,但 test_runtime_retention.py:121 只覆盖了存在且 clean 的 worktree 被删除。
  • runtime_retention.py:186:189 新增 dirty / local-ahead git recheck 后才允许 git worktree remove,但测试里的 fake runner 只返回 clean status 和 0 ahead,没有证明 dirty output 或 ahead count 会阻止 remove/prune。
  • runtime_retention.py:167 拒绝 absolute path、path traversal、非 .worktrees/<name> plan path,但当前测试只覆盖了一个 false planner proof field,没有覆盖 escaping/malformed path 对 destructive cleanup 的拦截。

建议补 test_runtime_retention.py 行为测试:构造 eligible RuntimeRetentionPlan,分别断言 missing path、dirty status、positive ahead count、absolute/path traversal plan path 时 removed_worktrees == 0,不会调用 git worktree remove,也不会调用 prune。

没有看到本 PR 新增 sleep/skip/manual 测试,也没有看到明显的 assertion loosening。

⟦AI:AUTO-LOOP⟧

loning added a commit that referenced this pull request Jun 3, 2026
根因:reviewer artifact frontmatter 不带 head_sha,_review_evidence_from_artifact 取不到所审 head SHA → 所有 reviewing PR(#480/#473/#402)恒判 WAIT_OR_REDISPATCH:invalid_reviewer_evidence:missing_reviewed_head_sha,且 wakeup-plan 不自动重派 reviewer → review→merge headless 永久卡死(挡住发版)。

修法(codex hotfix worker 实现,controller 验证后提交):
- controller_actions 派 reviewer 时取 headRefOid,缺则 fail-closed,注入权威 HEAD_SHA 进 rendered prompt。
- reviewer-{architect,tests,quality}.md frontmatter 写 head_sha: ${HEAD_SHA}。
- wakeup_runner._review_head_sha_for:artifact 缺 head_sha 时从 controller-rendered prompt/log 回收(不依赖 worker 自报,避免 markerless 覆辙);live-head 陈旧判定与 merge 真值表语义不变。
- wakeup_plan 为 missing/stale reviewer head 的 OPEN reviewing PR 投射可执行 dispatch_reviewers 自愈动作。

验证:controller 独立重跑 test_wakeup_runner+review_gate+wakeup_plan+controller_actions 293 通过、marker/prompt 合同 30 通过;review-gate 安全语义(stale head 仍不 merge)由 13 项 review-gate 测试锁住。

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
loning added a commit that referenced this pull request Jun 3, 2026
…merge 闭环

承接 r7:r7 让 reviewer 证据绑定 head_sha,但 wakeup_plan 投射的 review-evidence-redispatch 自愈动作被 closed-action-projection finalizer 强制 status_only(其 kind 不在 EXECUTABLE_ACTION_KINDS),runner 永不应用 → #480/#473/#402 缺 head 的 review 证据无法自动重派 → review_gate 恒 WAIT_OR_REDISPATCH → 并发耗干到 0。

修法(codex hotfix worker 实现,controller 验证后提交):
- wakeup_plan EXECUTABLE_ACTION_KINDS 加 review-evidence-redispatch,finalizer 保留 runner authority 不再强制 status_only。
- controller_actions.dispatch_reviewers 读 stale_review_roles 只重派陈旧角色,且 _pending_review_spawn_exists 跳过已有在飞 dispatch-reviewers:<pr>:<role>:r1 intent(幂等不双派);仍走 r7 的 HEAD_SHA 注入,使重派 reviewer 的新证据 head-bound。
- merge-gate 真值表不变,仅让既有自愈动作可执行 + 定向到陈旧角色。

验证:controller 独立重跑 test_wakeup_plan+wakeup_runner+review_gate+controller_actions 295 通过。

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
loning added a commit that referenced this pull request Jun 3, 2026
…ady PR headless merge,backlog 能 drain

根因(已核实):wakeup_plan 每 tick 把 SPAWN actions(implement intent + design-consensus #495/#496/#490/#491/#418,index 0-5)排在 lifecycle review_gate(#480/#473/#402,index 6-8)之前;run_once 旧逻辑 `if applied_spawns>0 and not is_spawn_action: break` → 只要派了任一 spawn,首个 lifecycle 即 break 整 tick。design-consensus 几乎总有 spawn 活 → review_gate/merge 永远 starve → #473(approve/approve/comment → MERGE_WITH_COMMENTS,MERGEABLE+CI 绿)永不 merge,active 数永不下降。

修法(最小,核心 dispatch loop):
- 引入 consumes_spawn_budget = is_spawn_action or _uses_spawn_budget(action);超 budget 时 `continue`(不 break)以便仍能抵达后续 drain-only lifecycle。
- _uses_spawn_budget:dispatch_reviewers=True、review_gate 仅当决策为 FIX(会派 fix worker)=True;否则 review_gate(MERGE/MERGE_WITH_COMMENTS)、close 等 drain-only=False → 不占 spawn budget、floor 填满后仍被评估执行。
- 保留 r6 的 spawn launch-failure break/retry 与 dispatch_design_consensus no-intents skip-and-continue;真值表/head-binding/CI-green/mergeable 安全语义不变。

结果:每 tick 既用 spawn 填满 floor,又评估 merge/close 等 drain-only lifecycle,backlog 能 drain。

验证:controller 独立重跑 test_wakeup_runner+review_gate+review_gate_e2e+wakeup_plan+controller_actions+pr_checks 309 通过;新增 test_wakeup_runner_lifecycle_review_gate_not_starved_after_spawn_batch。

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
loning added a commit that referenced this pull request Jun 3, 2026
… malformed — review→merge 根因

根因(实测确认):_run_command 对所有 gh 命令在 full[1:1] 插 --repo(子命令之前)。gh 不接受 --repo 作 pre-subcommand 全局 flag,且 gh api 根本不吃 --repo(repo 在 URL):`gh --repo O/R api repos/O/R/pulls/473` → rc=1 "unknown flag: --repo"。于是 wakeup-runner 的每个 gh 调用都 malformed:CI 读(PrChecksProjection(runner=_run_command) 的 gh api pulls/check-runs)→ ci_unavailable:pull_api_failure;mergeability(gh pr view)→ mergeability_unavailable;live-state 等同理。→ review gate 永远读不到 CI/mergeable → #473(approve/approve/comment=MERGE_WITH_COMMENTS,绿)、#402#480 永不 headless 落地。r11 的 retry 只是重复同一 malformed 命令。直连 PrChecksProjection()/直连 gh 用的是 plain default runner(无注入)所以正常,掩盖了 bug。

最小修法:_run_command 仅对 `gh pr`/`gh issue` 注入 --repo,且放在子命令(及其 verb)之后的合法位置;`gh api` 完全不动;绝不 pre-subcommand 注入;已存在 --repo 则不重复。

验证:controller 独立重跑 test_wakeup_runner+review_gate+review_gate_e2e+pr_checks+wakeup_plan+controller_actions 312 通过;新增 argv-shape 测试 + e2e(stub subprocess.run 拒绝 malformed `gh --repo ... api`,ready PR 仍 merge)。

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@loning
Copy link
Copy Markdown
Contributor Author

loning commented Jun 3, 2026

🤖 Quality review approves PR 480

TL;DR

  • 这是什么: 我按 readability、命名、简单性、死代码和 refactor self-doc policy 检查了 PR 480。
  • 结论是什么: approve, 成为唯一主实现, 只是同 handler 的兼容别名, 没看到质量阻塞问题。
  • controller 下一步: 可把这条 quality 结果纳入 review truth table, 不需要 maintainer 处理 quality 项。

详细说明

重点看了 这个 skill-private cleanup 的主实现、 这个旧命令兼容入口、 的命令注册, 以及 的调用点。新代码把文件清理、pending-events 原地压缩、planner-proven worktree 清理拆成私有 helper, 每个函数尺寸和分支数都在可读范围内。

只是兼容别名, 没保留第二套清理逻辑; 把 注册为 canonical command, 复用同一个 handler; 在 daemon freshness checks 前调用 。这些路径都有对应测试锁住, 没发现新死代码、单实现多余抽象、无意义注释或离题改动。默认 , diff 里也没有新增 source-code 自文档注释。


📎 完整 codex 原始输出(存档备查)

pr: 480
role: quality
head_sha: 9c05257
verdict: approve

Verdict

Approve: the RuntimeRetention split is readable, focused, reachable from both canonical and compatibility command paths, and I found no quality regressions under the default none refactor self-doc policy.

Evidence

  • skills/codex-refactor-loop/scripts/codex_refactor_loop/runtime_retention.py:42 keeps the public entry point as retain_runtime(...) with a small orchestration body and private helpers for file deletion, pending-event compaction, planner parsing, path eligibility, and git verification; no single new method crosses the preferred size/branch threshold.
  • skills/codex-refactor-loop/scripts/codex_refactor_loop/retention.py:1 is a minimal compatibility alias that delegates to runtime_retention_main, so the legacy log-retention surface does not grow a second retention implementation.
  • skills/codex-refactor-loop/scripts/codex_refactor_loop/cli.py:170 registers runtime-retention as canonical and log-retention as the same-handler alias, and test_cli_command_router.py:375 asserts the handler and authority remain identical.
  • skills/codex-refactor-loop/scripts/codex_refactor_loop/restart.py:345 calls retain_runtime(...) before daemon freshness checks, while test_restart_daemons.py:85 and test_runtime_retention.py:215 keep that call path covered.
  • skills/codex-refactor-loop/scripts/test_log_retention.py:37 narrows the legacy test to alias behavior instead of preserving dead retain_logs expectations.
  • git diff --check origin/auto-refact-dev...origin/refactor/iter437-issue-437 returned clean, and the diff adds no new source-code Refactor (...) / Old pattern / New principle comments outside the documented/tested policy surfaces.

⟦AI:AUTO-LOOP⟧
REVIEW_DONE:480:quality:approve

⟦AI:AUTO-LOOP⟧

@loning
Copy link
Copy Markdown
Contributor Author

loning commented Jun 3, 2026

🤖 架构审查结论: reject

TL;DR


详细说明

RuntimeRetention 是本 PR 新增的运行态清理 owner,负责清理 skill-private 的 .refactor-loop 生成物和 stale worktree。问题在于 PR 自己写入的 CLAUDE.md #437 carveout 说 .refactor-loop/{logs,prompts,runs} 内 generated regular files 只有在 planner 证明 eligible 后才能删除,但 runtime_retention.py 当前按目录、后缀、mtime 直接删旧文件。

具体证据:skills/codex-refactor-loop/scripts/codex_refactor_loop/runtime_retention.py:56 先调用 _delete_generated_files(...),而 planner artifact .refactor-loop/state/runtime-retention-plan.json 只在后面的 worktree cleanup 分支读取。runtime_retention.py:67runtime_retention.py:94logs/prompts/runs 里的 .json/.log/.md/.txt old regular file 直接 unlink。这和 CLAUDE.md:40#437 文字不一致:该条只授权删除 planner 证明 eligible 的 generated regular files。架构上这是 blocking mismatch,因为 controller-runtime 例外必须 narrow allowlist + durable artifact 双锚定,不能让代码路径比授权文本更宽。

修复路径很窄:让 generated-file cleanup 也消费 durable planner proof,并补测试证明“old generated file without planner proof is kept”;或者重新收窄/改写 #437 授权文本,但那需要和当前 carveout 同等级的共识与 source-regression。


📎 完整 codex 原始输出(存档备查)

pr: 480
role: architect
head_sha: 9c05257
verdict: reject

Verdict

reject - RuntimeRetention deletes generated .refactor-loop/{logs,prompts,runs} files by age/suffix without the planner eligibility proof required by the new #437 carveout.

Evidence

  • skills/codex-refactor-loop/scripts/codex_refactor_loop/runtime_retention.py:56-59 calls _delete_generated_files(repo_real, cutoff) before reading .refactor-loop/state/runtime-retention-plan.json; runtime_retention.py:67-94 then deletes any old regular file in logs, prompts, or runs with suffix .json, .log, .md, or .txt via path.unlink(missing_ok=True). This violates the PR's own CLAUDE.md:40 .refactor-loop 历史生成物与陈旧 worktree 的有界清理机制 #437 clause: "#437 是唯一 skill-private runtime-retention local-GC carveout:active-controller owner 的 checked-in RuntimeRetentionhelper 只可在 host opt-in 后删除$REPO_ROOT/.refactor-loop/{logs,prompts,runs}内 planner 证明 eligible 的 generated regular files、same-inode compact$REPO_ROOT/.refactor-loop/.controller-pending-events.log、以及对 $REPO_ROOT/.worktrees/中 no in-flight/no OPEN issue-or-PR/no dirty/no local-ahead/merged-or-missing-safe 的 stale worktree 运行git worktree remove git worktree prune;允许 GitHub only read OPEN issue/PR/head state;允许 git only read projection plus git worktree remove/git worktree prune;禁止 git fetch、branch deletion、commit/push/reset/rebase/merge/tag、issue/PR create/edit/close/merge、label mutation、tag/release、archive/index durable fact source、generic lifecycle actor。"
  • The same mismatch appears in the skill contract: skills/codex-refactor-loop/SKILL.md:309 says "only when $RUNTIME_RETENTION_ENABLE=true, delete planner-eligible generated regular files under $REPO_ROOT/.refactor-loop/{logs,prompts,runs} older than the retention TTL", but runtime_retention.py:71-89 has no per-file planner eligibility input or proof check. This also regresses CLAUDE.md:43: "变更必须可验证:行为约束必须落到机械验证手段(behavior test / source-regression test / 段落 lint);仅靠\"agent 应该记得\"承载的约束视为未落地。" The added behavior test test_runtime_retention.py locks the broad age/suffix deletion behavior instead of the stated planner-proof behavior.

What would change your verdict

Make generated-file cleanup consume a durable planner artifact or equivalent explicit plan/proof for .refactor-loop/{logs,prompts,runs} entries before unlinking, and update behavior tests/source-regression so an old generated file without planner eligibility proof is kept. Alternatively, narrow the #437 CLAUDE/SKILL/authorization text to honestly authorize age/suffix-based generated-file deletion, but that would need the same repo-level narrow-allowlist discipline and consensus basis as the current carveout.

⟦AI:AUTO-LOOP⟧
REVIEW_DONE:480:architect:reject

⟦AI:AUTO-LOOP⟧

@loning
Copy link
Copy Markdown
Contributor Author

loning commented Jun 3, 2026

🤖 Quality review approves PR 480

TL;DR

  • 这是什么: 我按 readability、命名、简单性、死代码和 refactor self-doc policy 检查了 PR 480。
  • 结论是什么: approve, RuntimeRetention 是运行时清理的唯一主实现, log-retention 只是同 handler 的兼容别名。
  • controller 下一步: 可把这条 quality 结果纳入 review truth table, 不需要 maintainer 处理 quality 项。

详细说明

重点看了 runtime_retention.py 这个 skill-private cleanup 的主实现、retention.py 这个旧命令兼容入口、cli.py 的命令注册, 以及 restart.py 的调用点。新代码把文件清理、pending-events 原地压缩、planner-proven worktree 清理拆成私有 helper, 每个函数尺寸和分支数都在可读范围内。

retention.py:1 只是兼容别名, 没保留第二套清理逻辑; cli.py:170runtime-retention 注册为 canonical command, log-retention 复用同一个 handler; restart.py:345 在 daemon freshness checks 前调用 retain_runtime(...)。这些路径都有对应测试锁住, 没发现新死代码、单实现多余抽象、无意义注释或离题改动。默认 HOST_REFACTOR_COMMENT_POLICY=none, diff 里也没有新增 source-code Refactor (...) 自文档注释。


📎 完整 codex 原始输出(存档备查)

pr: 480
role: quality
head_sha: 9c05257
verdict: approve

Verdict

Approve: the RuntimeRetention split is readable, focused, reachable from both canonical and compatibility command paths, and I found no quality regressions under the default none refactor self-doc policy.

Evidence

  • skills/codex-refactor-loop/scripts/codex_refactor_loop/runtime_retention.py:42 keeps the public entry point as retain_runtime(...) with a small orchestration body and private helpers for file deletion, pending-event compaction, planner parsing, path eligibility, and git verification; no single new method crosses the preferred size/branch threshold.
  • skills/codex-refactor-loop/scripts/codex_refactor_loop/retention.py:1 is a minimal compatibility alias that delegates to runtime_retention_main, so the legacy log-retention surface does not grow a second retention implementation.
  • skills/codex-refactor-loop/scripts/codex_refactor_loop/cli.py:170 registers runtime-retention as canonical and log-retention as the same-handler alias, and test_cli_command_router.py:375 asserts the handler and authority remain identical.
  • skills/codex-refactor-loop/scripts/codex_refactor_loop/restart.py:345 calls retain_runtime(...) before daemon freshness checks, while test_restart_daemons.py:85 and test_runtime_retention.py:215 keep that call path covered.
  • skills/codex-refactor-loop/scripts/test_log_retention.py:37 narrows the legacy test to alias behavior instead of preserving dead retain_logs expectations.
  • git diff --check origin/auto-refact-dev...origin/refactor/iter437-issue-437 returned clean, and the diff adds no new source-code Refactor (...) / Old pattern / New principle comments outside the documented/tested policy surfaces.

⟦AI:AUTO-LOOP⟧
REVIEW_DONE:480:quality:approve

⟦AI:AUTO-LOOP⟧

@loning
Copy link
Copy Markdown
Contributor Author

loning commented Jun 3, 2026

🤖 Tests review: reject,RuntimeRetention 删除保护缺测试

TL;DR

  • 这是什么: PR 480 的 tests 角度独立 review。
  • 现在到哪一步 / 结论是什么: reject,因为新 RuntimeRetention 本地清理逻辑缺少 dirty/ahead 保护分支测试。
  • 需要 maintainer 做什么 OR controller 下一步: controller 应派 fix worker 补齐这些删除前 fail-closed 行为测试。

详细说明

RuntimeRetention 是本 PR 新增的 skill-private runtime cleanup owner,会在 host opt-in 后删除 .refactor-loop 生成物并移除 stale worktree。现有测试覆盖了干净 worktree 的移除路径和 planner proof 为 false 的跳过路径,但没有覆盖真正阻止破坏性删除的 git recheck 分支: git status --porcelain 有输出或失败时必须跳过,git rev-list --count @{upstream}..HEAD 返回正数时也必须跳过。

这属于 net-new destructive local-GC authority 的覆盖缺口,不是 nice-to-have。目标修复很小: 在 test_runtime_retention.py 增加 planner-eligible worktree 用例,分别模拟 dirty、status failure、local-ahead,并断言 removed_worktrees == 0,且没有对该 worktree 调 git worktree remove / git worktree prune


📎 完整 codex 原始输出(存档备查)
---
pr: 480
role: tests
head_sha: 9c052577d132e72c1acc0642a6192a09c488c220
verdict: reject
---

## Verdict
Reject: the new RuntimeRetention tests cover the happy worktree-removal path, but they do not cover the dirty/ahead git recheck branches that prevent destructive `git worktree remove`.

## Evidence
- `skills/codex-refactor-loop/scripts/codex_refactor_loop/runtime_retention.py:186` and `:189` add the real safety rechecks before deletion: `git status --porcelain` must be clean and `git rev-list --count @{upstream}..HEAD` must not report local-ahead commits. These are net-new branches on a destructive cleanup path.
- `skills/codex-refactor-loop/scripts/test_runtime_retention.py:121` only exercises one eligible clean worktree and one planner-proof-false skip. Its fake runner always returns clean status and `0` ahead for the eligible worktree, so no test proves RuntimeRetention skips `git worktree remove` when the local recheck reports dirty files or ahead commits.
- Positive coverage is otherwise present: generated-file TTL deletion, symlink/FIFO preservation, same-inode pending-event compaction, host opt-in/noop behavior, CLI summary, log-retention aliasing, command authority source-regression, and runtime-exception source-regression are covered. Targeted PR-head verification passed with `cd skills/codex-refactor-loop/scripts && python3 -m unittest test_runtime_retention.py test_log_retention.py test_cli_command_router.py test_runtime_exception_authorization_sources.py` (`68 tests`, OK).

## What would change your verdict (only if comment or reject)
Add behavior tests in `test_runtime_retention.py` that create planner-eligible stale worktree entries and assert `git worktree remove` is not invoked when `git status --porcelain` returns output, when the status command fails, when `rev-list --count @{upstream}..HEAD` returns a positive count, and ideally when the worktree path is missing. These should assert `removed_worktrees == 0` and no prune/remove commands were issued for the blocked worktree.

⟦AI:AUTO-LOOP⟧
REVIEW_DONE:480:tests:reject

⟦AI:AUTO-LOOP⟧

@loning
Copy link
Copy Markdown
Contributor Author

loning commented Jun 4, 2026

🤖 Tests review 结论: reject

TL;DR

  • 这是什么: PR 480 的测试覆盖与测试质量审查。
  • 现在到哪一步 / 结论是什么: 发现新 RuntimeRetention 的 worktree 删除路径缺少 fail-closed 负向测试, verdict 是 reject。
  • controller 下一步: 请实现方补齐这些负向测试后重新进入 review。

详细说明

RuntimeRetention 是新的运行时保留清理器,会在 host opt-in 后清理 .refactor-loop 下的生成物,并可按计划移除 stale worktree。现有测试覆盖了正向删除、symlink/FIFO 保留、pending-events 同 inode 压缩、CLI 输出和 source-regression anchor,这些都不是问题。

阻塞点在 skills/codex-refactor-loop/scripts/codex_refactor_loop/runtime_retention.py:126: 这里会在 _git_verification_passes(...) 通过后执行 git worktree removeskills/codex-refactor-loop/scripts/test_runtime_retention.py:121 只覆盖了 happy path 和一个 planner proof 为 false 的条目,没有覆盖 dirty worktree、local ahead、git recheck non-zero、path missing 这些应当阻止删除的负向分支。因为这是 net-new destructive logic,测试角度必须 reject。

需要补的测试很具体: 复用现有 fake command_runner,分别让 git status --porcelain 返回脏输出、rev-list --count @{upstream}..HEAD 返回正数、git recheck 返回 non-zero、eligible path 不存在,并断言没有记录任何 git worktree remove 调用。


📎 完整 codex 原始输出(存档备查)

pr: 480
role: tests
head_sha: 9c05257
verdict: reject

Verdict

Reject: test coverage is generally behavior-oriented, but the new destructive worktree-removal path lacks negative coverage for the git recheck gates.

Evidence

  • skills/codex-refactor-loop/scripts/codex_refactor_loop/runtime_retention.py:126 only removes a planned stale worktree after _git_verification_passes(...); test_stale_worktree_requires_planner_proof_and_git_recheck covers the happy path plus one bad planner proof, but no test exercises the fail-closed outcomes where git status --porcelain reports dirty content, the upstream-ahead check returns a positive count, git returns non-zero, or the eligible path directory is missing. This is net-new destructive logic that invokes git worktree remove, so negative guard coverage is required before merge.
  • skills/codex-refactor-loop/scripts/test_runtime_retention.py:72, :92, :107, :121, and :170 are meaningful behavior tests for opt-in deletion, symlink/FIFO preservation, same-inode event compaction, planner-proof worktree removal, and CLI summary output. I did not find skip/manual markers, assertion loosening, or test pacing via real sleeps in the PR diff.
  • skills/codex-refactor-loop/scripts/test_runtime_retention.py:192 adds source-regression assertions for the new RuntimeRetention contract and forbidden tokens, which matches the repository rule that runtime exceptions need both behavior and source-regression coverage.

What would change your verdict

Add deterministic behavior tests for the worktree-removal fail-closed gates: dirty worktree output prevents git worktree remove, positive local-ahead count prevents removal, non-zero git recheck prevents removal, and missing worktree path prevents removal. These can reuse the existing fake command_runner and assert that no ("worktree", "remove", ...) command is recorded.

⟦AI:AUTO-LOOP⟧
REVIEW_DONE:480:tests:reject

⟦AI:AUTO-LOOP⟧

@loning
Copy link
Copy Markdown
Contributor Author

loning commented Jun 4, 2026

🤖 架构审查结论: reject

TL;DR

  • 这是什么: PR #437: canonical RuntimeRetention owner + log-retention alias + CLAUDE.md carveout #480 的 architect 角度审查,只看 CLAUDE.md/AGENTS.md 架构合规。
  • 现在到哪一步 / 结论是什么: reject,因为新 RuntimeRetention helper 的多个失败/跳过路径静默返回,违反“失败/skip/fail-closed 必须可诊断日志或异常”的仓库规则。
  • controller 下一步: 请 implement codex 补一条窄诊断面,让每个 runtime-retention skip/fail-closed 分支输出包含目标和原因的单行信息后再复审。

详细说明

RuntimeRetention 是这轮新增的本地运行态清理 owner,负责 .refactor-loop 生成物和 stale worktree 清理。它的 #437 carveout 边界本身看起来是窄的:host opt-in、只碰 skill-private runtime state、禁止 GitHub 写和 git lifecycle 扩权。但实现里有几类静默跳过:文件删除 OSError 被吞掉、pending-events compaction 失败直接返回 False、retention plan 读失败/JSON 损坏直接当空计划、stale worktree 候选不合格时没有逐项原因。这会让外部状态面无法区分“安全保留”和“清理逻辑坏了”。

阻塞条款来自 CLAUDE.md 工程约定:任何错误、失败、fail-closed、skip 都必须抛出异常或写一条可 grep 的单行日志,包含目标 id、原因和关键计数。修复不需要扩权:可以在现有 summary/log 输出里加 bounded warning list/count,例如 runtime_retention_skip target=<path> reason=<reason>;不要新增 archive/index durable fact source,也不要扩大 #437 的 git/GitHub allowlist。


📎 完整 codex 原始输出(存档备查)
---
pr: 480
role: architect
head_sha: 9c052577d132e72c1acc0642a6192a09c488c220
verdict: reject
---

## Verdict
reject: RuntimeRetention introduces silent fail-closed/skip paths, which violates the repository requirement that failures, skips, and fail-closed decisions be diagnosable.

## Evidence
- `skills/codex-refactor-loop/scripts/codex_refactor_loop/runtime_retention.py:92`: `_delete_generated_files()` catches `OSError` and silently keeps the file. This is a cleanup skip/failure with no target id or reason, violating `CLAUDE.md` 工程约定: “任何错误、失败、fail-closed、skip、WAIT、budget 耗尽、分支不可达都必须抛出异常或写一条**可诊断、可 grep 的单行日志**(含目标 id、原因、关键计数),让问题在外部状态面立即可见。**禁止**`except`/`except: pass`、裸吞错、静默 `continue`/`return`、不记原因的跳过。”
- `skills/codex-refactor-loop/scripts/codex_refactor_loop/runtime_retention.py:100`: `_compact_pending_events()` swallows `OSError` and returns `False`, making event-log compaction failure indistinguishable from “not needed”. Same `CLAUDE.md` clause applies because this is a fail-closed/skip path without a diagnosable reason.
- `skills/codex-refactor-loop/scripts/codex_refactor_loop/runtime_retention.py:137`: `_read_retention_plan()` catches `OSError` and `JSONDecodeError` and returns an empty plan. That silently disables stale worktree cleanup even when the durable plan exists but is unreadable/corrupt, again violating the same `CLAUDE.md` diagnostic logging clause.
- `skills/codex-refactor-loop/scripts/codex_refactor_loop/runtime_retention.py:123`: stale worktree candidates rejected by `_eligible_worktree_path()` or `_git_verification_passes()` are skipped with no per-worktree reason. The #437 carveout in `CLAUDE.md` says RuntimeRetention may remove only planner-proven stale worktrees, and the general logging clause requires skipped/fail-closed worktree candidates to expose a grep-able reason.

## What would change your verdict
Add a narrow diagnostic surface to RuntimeRetention so every failure/skip/fail-closed branch writes or returns a single-line reason containing the affected path/worktree and cause, then ensure `main()` and `restart-daemons` surface those reasons. Do not widen the #437 allowlist or add a durable archive/index fact source; this can be a bounded warning list/count in the existing summary/log output.

⟦AI:AUTO-LOOP⟧
REVIEW_DONE:480:architect:reject

⟦AI:AUTO-LOOP⟧

@loning
Copy link
Copy Markdown
Contributor Author

loning commented Jun 4, 2026

🤖 Quality review: reject, retention failure log loses the reason

TL;DR

  • 这是什么: PR 480 的 readability/simplicity 质量审查。
  • 结论: reject, 主要问题是 RuntimeRetention 失败时 restart helper 只写泛化 warning, 不写异常类型/消息。
  • 下一步: controller/作者把异常 detail 写进单行日志后, 这个质量结论可以改为 approve。

详细说明

是本 PR 新增的本地 runtime 清理入口, 会删除 skill-private generated files, 也可能按 plan 移除 stale worktree。 现在 catch 住所有 , 但 只输出 , 没有 之类的异常 detail。结果是 cleanup 失败、path guard 失败、git recheck 失败等都会变成同一行日志, 排查时没有目标原因。

建议和旁边 的写法一致: 后在同一行 warning 中包含 。这个改动很小, 不改变控制流, 但能让失败日志可 grep、可诊断。


📎 完整 codex 原始输出(存档备查)

pr: 480
role: quality
head_sha: 9c05257
verdict: reject

Verdict

Reject: the RuntimeRetention split is otherwise readable and focused, but the new restart integration swallows retention failures without the exception detail needed to diagnose local cleanup problems.

Evidence

  • skills/codex-refactor-loop/scripts/codex_refactor_loop/runtime_retention.py:42 keeps the public entry point as retain_runtime(...) with a small orchestration body and private helpers for file deletion, pending-event compaction, planner parsing, path eligibility, and git verification; no single new method crosses the preferred size/branch threshold.
  • skills/codex-refactor-loop/scripts/codex_refactor_loop/retention.py:1 is a minimal compatibility alias that delegates to runtime_retention_main, so the legacy log-retention surface does not grow a second retention implementation.
  • skills/codex-refactor-loop/scripts/codex_refactor_loop/cli.py:170 registers runtime-retention as canonical and log-retention as the same-handler alias, and test_cli_command_router.py:375 asserts the handler and authority remain identical.
  • skills/codex-refactor-loop/scripts/codex_refactor_loop/restart.py:345 calls retain_runtime(...) before daemon freshness checks, while test_restart_daemons.py:85 and test_runtime_retention.py:215 keep that call path covered.
  • skills/codex-refactor-loop/scripts/test_log_retention.py:37 narrows the legacy test to alias behavior instead of preserving dead retain_logs expectations.
  • skills/codex-refactor-loop/scripts/codex_refactor_loop/restart.py:348 catches every Exception from RuntimeRetention and line 349 logs only runtime_retention warning: helper failed; continuing daemon restart. That hides the exception type/message for a helper that deletes generated files and can remove worktrees, making a failed cleanup indistinguishable from any other failure during daemon restart.
  • git diff --check origin/auto-refact-dev...origin/refactor/iter437-issue-437 returned clean, and the diff adds no new source-code Refactor (...) / Old pattern / New principle comments outside the documented/tested policy surfaces.

What would change your verdict (only if comment or reject)

Log the exception detail in RestartDaemons._run_runtime_retention, matching the nearby update-check pattern, for example except Exception as exc: followed by a one-line warning that includes {exc!r}.

⟦AI:AUTO-LOOP⟧

⟦AI:AUTO-LOOP⟧

@loning
Copy link
Copy Markdown
Contributor Author

loning commented Jun 4, 2026

🤖 Quality review: reject, retention failure log loses the reason

TL;DR

  • 这是什么: PR 480 的 readability/simplicity 质量审查。
  • 结论: reject, 主要问题是 RuntimeRetention 失败时 restart helper 只写泛化 warning, 不写异常类型/消息。
  • 下一步: controller/作者把异常 detail 写进单行日志后, 这个质量结论可以改为 approve。

详细说明

上一条同标题评论的正文被 shell 反引号展开破坏了, 以这条为准。RuntimeRetention 是本 PR 新增的本地 runtime 清理入口, 会删除 skill-private generated files, 也可能按 plan 移除 stale worktree。skills/codex-refactor-loop/scripts/codex_refactor_loop/restart.py:348 现在 catch 住所有 Exception, 但 restart.py:349 只输出 runtime_retention warning: helper failed; continuing daemon restart, 没有 {exc!r} 之类的异常 detail。结果是 cleanup 失败、path guard 失败、git recheck 失败等都会变成同一行日志, 排查时没有目标原因。

建议和旁边 _run_update_check 的写法一致: except Exception as exc: 后在同一行 warning 中包含 {exc!r}。这个改动很小, 不改变控制流, 但能让失败日志可 grep、可诊断。


📎 完整 codex 原始输出(存档备查)

pr: 480
role: quality
head_sha: 9c05257
verdict: reject

Verdict

Reject: the RuntimeRetention split is otherwise readable and focused, but the new restart integration swallows retention failures without the exception detail needed to diagnose local cleanup problems.

Evidence

  • skills/codex-refactor-loop/scripts/codex_refactor_loop/runtime_retention.py:42 keeps the public entry point as retain_runtime(...) with a small orchestration body and private helpers for file deletion, pending-event compaction, planner parsing, path eligibility, and git verification; no single new method crosses the preferred size/branch threshold.
  • skills/codex-refactor-loop/scripts/codex_refactor_loop/retention.py:1 is a minimal compatibility alias that delegates to runtime_retention_main, so the legacy log-retention surface does not grow a second retention implementation.
  • skills/codex-refactor-loop/scripts/codex_refactor_loop/cli.py:170 registers runtime-retention as canonical and log-retention as the same-handler alias, and test_cli_command_router.py:375 asserts the handler and authority remain identical.
  • skills/codex-refactor-loop/scripts/codex_refactor_loop/restart.py:345 calls retain_runtime(...) before daemon freshness checks, while test_restart_daemons.py:85 and test_runtime_retention.py:215 keep that call path covered.
  • skills/codex-refactor-loop/scripts/test_log_retention.py:37 narrows the legacy test to alias behavior instead of preserving dead retain_logs expectations.
  • skills/codex-refactor-loop/scripts/codex_refactor_loop/restart.py:348 catches every Exception from RuntimeRetention and line 349 logs only runtime_retention warning: helper failed; continuing daemon restart. That hides the exception type/message for a helper that deletes generated files and can remove worktrees, making a failed cleanup indistinguishable from any other failure during daemon restart.
  • git diff --check origin/auto-refact-dev...origin/refactor/iter437-issue-437 returned clean, and the diff adds no new source-code Refactor (...) / Old pattern / New principle comments outside the documented/tested policy surfaces.

What would change your verdict (only if comment or reject)

Log the exception detail in RestartDaemons._run_runtime_retention, matching the nearby update-check pattern, for example except Exception as exc: followed by a one-line warning that includes {exc!r}.

⟦AI:AUTO-LOOP⟧

⟦AI:AUTO-LOOP⟧

loning added 2 commits June 5, 2026 15:22
…ase)

# Conflicts:
#	CLAUDE.md
#	skills/codex-refactor-loop/scripts/test_comment_monitor.py
#	skills/codex-refactor-loop/scripts/test_log_retention.py
#	skills/codex-refactor-loop/scripts/test_restart_daemons.py
#	skills/codex-refactor-loop/scripts/test_runtime_exception_authorization_sources.py
#	skills/codex-refactor-loop/scripts/test_skill_reference_anchors.py
@loning
Copy link
Copy Markdown
Contributor Author

loning commented Jun 5, 2026

🤖 架构审查结论:reject,RuntimeRetention 仍有静默 skip

TL;DR

  • 这是什么: PR 480 的 architect / CLAUDE.md 合规复审。
  • 现在到哪一步 / 结论是什么: reject,因为 RuntimeRetention,即运行态生成物清理 helper,仍把失败和跳过吞成无诊断结果。
  • controller 下一步: 请 implement codex 补齐每个 fail-closed / skip 分支的单行可 grep 诊断后再送审。

详细说明

这轮复审看的是 head 5e5bb1e3d2a03208e2cd8ff0652456b6eef36c4e。问题集中在 skills/codex-refactor-loop/scripts/codex_refactor_loop/runtime_retention.py: OSError、损坏的 RuntimeRetentionPlan,即 stale worktree 清理计划,以及 worktree gate 失败都会被转换成 keptFalse、空 plan 或直接 continue,外部只看到“没清理”,看不到哪个路径、哪个 worktree、为什么跳过。

这违反 CLAUDE.md 的硬规则: “任何错误、失败、fail-closed、skip、WAIT、budget 耗尽、分支不可达都必须抛出异常或写一条可诊断、可 grep 的单行日志(含目标 id、原因、关键计数),让问题在外部状态面立即可见。禁止except/except: pass、裸吞错、静默 continue/return、不记原因的跳过。” 这不是测试覆盖问题,而是新 runtime authority 的诊断边界问题,所以 architect 角度必须 reject。


📎 完整 codex 原始输出(存档备查)
---
pr: 480
role: architect
head_sha: 5e5bb1e3d2a03208e2cd8ff0652456b6eef36c4e
verdict: reject
---

## Verdict
reject: RuntimeRetention still introduces silent fail-closed/skip paths, so merge would regress the CLAUDE.md diagnostic contract.

## Evidence
- `skills/codex-refactor-loop/scripts/codex_refactor_loop/runtime_retention.py:92`: `_delete_generated_files()` catches `OSError` and silently increments `kept`, with no path/reason in stdout, stderr, or returned diagnostics. This violates `CLAUDE.md` 工程约定: “任何错误、失败、fail-closed、skip、WAIT、budget 耗尽、分支不可达都必须抛出异常或写一条**可诊断、可 grep 的单行日志**(含目标 id、原因、关键计数),让问题在外部状态面立即可见。**禁止**`except`/`except: pass`、裸吞错、静默 `continue`/`return`、不记原因的跳过。”
- `skills/codex-refactor-loop/scripts/codex_refactor_loop/runtime_retention.py:100`: `_compact_pending_events()` catches `OSError` and returns `False`, making event-log read failure indistinguishable from “compaction not needed”. Same `CLAUDE.md` clause applies because this is a fail-closed runtime maintenance skip without a diagnosable line.
- `skills/codex-refactor-loop/scripts/codex_refactor_loop/runtime_retention.py:137`: `_read_retention_plan()` catches `OSError` / `JSONDecodeError` and returns an empty plan, silently disabling stale worktree cleanup when the plan is unreadable or corrupt. Same `CLAUDE.md` clause applies because malformed planner proof must be visible, not converted to “no work”.
- `skills/codex-refactor-loop/scripts/codex_refactor_loop/runtime_retention.py:123` and `:126`: stale worktree candidates rejected by `_eligible_worktree_path()` or `_git_verification_passes()` are skipped with no worktree id and no gate reason. The new #437 carveout in `CLAUDE.md` authorizes only planner-proven local GC, and the general logging clause requires skipped/fail-closed candidates to expose a grepable reason.

## What would change your verdict
Add a narrow diagnostic surface to `RuntimeRetention` so every failure/skip/fail-closed branch either raises a specific exception or emits/returns a single-line reason containing the affected path/worktree and cause, then make `main()` and `restart-daemons` surface those reasons. Keep the #437 allowlist narrow: no archive/index durable fact source, no new GitHub write, and no generic cleanup framework.

⟦AI:AUTO-LOOP⟧
REVIEW_DONE:480:architect:reject

⟦AI:AUTO-LOOP⟧

@loning
Copy link
Copy Markdown
Contributor Author

loning commented Jun 5, 2026

🤖 测试审查结论: reject

TL;DR

  • 这是什么: PR 480 的测试质量审查,只看测试覆盖与测试本身是否可靠。
  • 现在到哪一步 / 结论是什么: reject,因为新增的 RuntimeRetention CLI 测试在 PR head 上实际失败。
  • controller 下一步: 需要修正测试 fixture 的 host.env 路径后重新跑对应测试。

详细说明

我在提取出的 PR head 上跑了 , 结果 9 个测试中 2 个失败。失败点是 把测试用的 和 写进 , 但当前 合同明确不读取这个 legacy path。 启动 CLI 时也没有设置 , 所以 CLI 按预期 fail closed, 报 。

修法很窄: 测试 fixture 应该创建 , 并在 的 env 中传 。我没有看到新增 /skip/manual marker, 也没有看到断言被明显放松;主行为覆盖方向是对的,但新增测试必须先能在 PR head 上绿。


📎 完整 codex 原始输出(存档备查)

default raw artifact:

---
pr: 480
role: tests
head_sha: 5e5bb1e3d2a03208e2cd8ff0652456b6eef36c4e
verdict: reject
---

## Verdict
Reject: the new RuntimeRetention CLI tests are not passing on the PR head because the fixture uses the legacy `.refactor-loop/host.env` path that `LoopContext` explicitly refuses to read.

## Evidence
- `skills/codex-refactor-loop/scripts/test_runtime_retention.py:33` writes `REPO_ROOT` and `RUNTIME_RETENTION_ENABLE` to `.refactor-loop/host.env`, but `skills/codex-refactor-loop/scripts/codex_refactor_loop/context.py:158` says legacy `.refactor-loop/host.env` is not read.
- `skills/codex-refactor-loop/scripts/test_runtime_retention.py:49` runs the CLI with only `PATH` and `PYTHONPATH`, so `CONSENSUS_RND_HOST_ENV` is absent. On an extracted PR-head tree, `python3 -m unittest skills/codex-refactor-loop/scripts/test_runtime_retention.py` fails `test_default_disabled_noops_even_when_old_files_exist` and `test_cli_uses_host_env_and_reports_summary` with `FATAL: REPO_ROOT is unset; ... legacy .refactor-loop/host.env is not read`.
- I did not find added `sleep`/skip/manual-test markers or assertion loosening in the Python diff. The behavior tests otherwise cover delete scope, opt-in noop, symlink/FIFO preservation, pending-event compaction, worktree planner proof plus git recheck, command registration, aliasing, and authorization source-regression.

## What would change your verdict
Fix the `RuntimeRetentionBehaviorTests` fixture to create a host-owned `.config/consensus-rnd/host.env` and pass `CONSENSUS_RND_HOST_ENV=.config/consensus-rnd/host.env` in `run_cli`, then rerun `python3 -m unittest skills/codex-refactor-loop/scripts/test_runtime_retention.py`.

⟦AI:AUTO-LOOP⟧
REVIEW_DONE:480:tests:reject

⟦AI:AUTO-LOOP⟧

@loning
Copy link
Copy Markdown
Contributor Author

loning commented Jun 5, 2026

🤖 测试审查结论: reject(更正版)

TL;DR

  • 这是什么: PR 480 的测试质量审查,只看测试覆盖与测试本身是否可靠。
  • 现在到哪一步 / 结论是什么: reject,因为新增的 RuntimeRetention CLI 测试在 PR head 上实际失败。
  • controller 下一步: 需要修正测试 fixture 的 host.env 路径后重新跑对应测试。

详细说明

我在提取出的 PR head 上跑了 python3 -m unittest skills/codex-refactor-loop/scripts/test_runtime_retention.py, 结果 9 个测试中 2 个失败。失败点是 test_runtime_retention.py:33 把测试用的 REPO_ROOTRUNTIME_RETENTION_ENABLE 写进 .refactor-loop/host.env, 但当前 LoopContext 合同明确不读取这个 legacy path。test_runtime_retention.py:49 启动 CLI 时也没有设置 CONSENSUS_RND_HOST_ENV, 所以 CLI 按预期 fail closed, 报 REPO_ROOT is unset

修法很窄: 测试 fixture 应该创建 .config/consensus-rnd/host.env, 并在 run_cli 的 env 中传 CONSENSUS_RND_HOST_ENV=.config/consensus-rnd/host.env。我没有看到新增 sleep/skip/manual marker, 也没有看到断言被明显放松;主行为覆盖方向是对的,但新增测试必须先能在 PR head 上绿。


📎 完整 codex 原始输出(存档备查)
---
pr: 480
role: tests
head_sha: 5e5bb1e3d2a03208e2cd8ff0652456b6eef36c4e
verdict: reject
---

## Verdict
Reject: the new RuntimeRetention CLI tests are not passing on the PR head because the fixture uses the legacy `.refactor-loop/host.env` path that `LoopContext` explicitly refuses to read.

## Evidence
- `skills/codex-refactor-loop/scripts/test_runtime_retention.py:33` writes `REPO_ROOT` and `RUNTIME_RETENTION_ENABLE` to `.refactor-loop/host.env`, but `skills/codex-refactor-loop/scripts/codex_refactor_loop/context.py:158` says legacy `.refactor-loop/host.env` is not read.
- `skills/codex-refactor-loop/scripts/test_runtime_retention.py:49` runs the CLI with only `PATH` and `PYTHONPATH`, so `CONSENSUS_RND_HOST_ENV` is absent. On an extracted PR-head tree, `python3 -m unittest skills/codex-refactor-loop/scripts/test_runtime_retention.py` fails `test_default_disabled_noops_even_when_old_files_exist` and `test_cli_uses_host_env_and_reports_summary` with `FATAL: REPO_ROOT is unset; ... legacy .refactor-loop/host.env is not read`.
- I did not find added `sleep`/skip/manual-test markers or assertion loosening in the Python diff. The behavior tests otherwise cover delete scope, opt-in noop, symlink/FIFO preservation, pending-event compaction, worktree planner proof plus git recheck, command registration, aliasing, and authorization source-regression.

## What would change your verdict
Fix the `RuntimeRetentionBehaviorTests` fixture to create a host-owned `.config/consensus-rnd/host.env` and pass `CONSENSUS_RND_HOST_ENV=.config/consensus-rnd/host.env` in `run_cli`, then rerun `python3 -m unittest skills/codex-refactor-loop/scripts/test_runtime_retention.py`.

⟦AI:AUTO-LOOP⟧
REVIEW_DONE:480:tests:reject

⟦AI:AUTO-LOOP⟧

@loning
Copy link
Copy Markdown
Contributor Author

loning commented Jun 5, 2026

🤖 Quality review 拦截: authority token 与新测试夹具不一致

TL;DR


详细说明

我没有发现 RuntimeRetention 这个 skill-private 本地清理 helper 本身有过度抽象或命名泛化问题;核心问题是两处 touched surface 没对齐。

skills/codex-refactor-loop/scripts/codex_refactor_loop/cli.py:62revive-implements 仍声明 authority token 为 delete-log,但 skills/codex-refactor-loop/scripts/test_cli_command_router.py:24 已把闭合集合换成 delete-runtime。结果是 command authority fact source 已经不再闭合,test_every_command_declares_closed_authority_tokens 会在 ('delete-log',) 上失败。

skills/codex-refactor-loop/scripts/test_runtime_retention.py:33 把测试用 host env 写到 .refactor-loop/host.env,但当前 LoopContext 明确不读 legacy .refactor-loop/host.envrun_cli() 也没有传 CONSENSUS_RND_HOST_ENV。我从 PR head archive 跑窄测试时,test_default_disabled_noops_even_when_old_files_existtest_cli_uses_host_env_and_reports_summary 都以 REPO_ROOT is unset 失败。修法很小:像 test_log_retention.py 一样写 .config/consensus-rnd/host.env,并在 subprocess env 里传 CONSENSUS_RND_HOST_ENV


📎 完整 codex 原始输出(存档备查)
---
pr: 480
role: quality
head_sha: 5e5bb1e3d2a03208e2cd8ff0652456b6eef36c4e
verdict: reject
---

## Verdict
Reject because the PR leaves the public command authority catalog internally inconsistent and adds runtime-retention tests that cannot run under the current host.env locator contract.

## Evidence
- `skills/codex-refactor-loop/scripts/codex_refactor_loop/cli.py:62` still declares `COMMANDS["revive-implements"].authority` as `("delete-log",)`, while `skills/codex-refactor-loop/scripts/test_cli_command_router.py:24` replaces the closed authority token catalog with `delete-runtime`; the touched command router test fails with `('delete-log',)` outside `ALL_AUTHORITY_TOKENS`. This is not a naming nit: the public authority fact source no longer has a closed token vocabulary.
- `skills/codex-refactor-loop/scripts/test_runtime_retention.py:33` writes the fixture host env to `.refactor-loop/host.env`, but `run_cli()` at `skills/codex-refactor-loop/scripts/test_runtime_retention.py:49` does not set `CONSENSUS_RND_HOST_ENV` or source `REPO_ROOT`; both `test_default_disabled_noops_even_when_old_files_exist` and `test_cli_uses_host_env_and_reports_summary` fail with `FATAL: REPO_ROOT is unset... legacy .refactor-loop/host.env is not read`.

## What would change your verdict (only if comment or reject)
Change `revive-implements` to a valid closed authority token or keep `delete-log` in the catalog if it remains intentional, then update `test_runtime_retention.py` to create a host-owned `.config/consensus-rnd/host.env` and pass `CONSENSUS_RND_HOST_ENV` in `run_cli()` like `test_log_retention.py` already does. Re-run the narrow tests from the PR head.

⟦AI:AUTO-LOOP⟧
REVIEW_DONE:480:quality:reject

⟦AI:AUTO-LOOP⟧

@loning
Copy link
Copy Markdown
Contributor Author

loning commented Jun 5, 2026

🤖 关闭本 PR(stale + 实现有真实缺陷)— #437 转回 design-solving 重做

本 PR 是 #437(.refactor-loop 历史生成物/陈旧 worktree 的有界清理机制)的 stale-base 恢复尝试,但经 review + CI 确认实现有真实缺陷,非单纯 stale 噪音:

  • contract-tests 失败(真实测试失败,非 flaky)。
  • quality reject:public command authority catalog 内部不一致 + 新增的 runtime-retention 测试在当前 host.env locator 契约下无法运行(违反契约边界)。
  • 叠加 moving-target(集成分支高频前进,反复被挤 stale)+ 本特性为低优先级 janitorial。

结论:磨这个又 stale 又有设计缺陷的低价值 PR ROI 太低。关闭本 PR,把 #437 转回 design-solving 走一轮干净的 design-consensus 重新设计(带上本次 reviewer 的两点具体关切:command authority catalog 一致性 + retention 测试要兼容 host.env locator 契约),由共识产出正确实现,而不是续命这个分支。

⟦AI:AUTO-LOOP⟧

@loning loning removed the crnd:phase:reviewing review-gate reviewers in flight label Jun 5, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

crnd:human:auto auto-advancing, no human needed crnd:lifecycle:managed loop-managed item crnd:phase:closed closed (terminal protocol phase)

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant